A formant modification method for improved ASR of children’s speech
Abstract
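Differences in acoustic characteristics between children’s and adults’ speech degrade the performance of automatic speech recognition (ASR) systems when systems trained using adults’ speech are used to recognize children’s speech. This degradation is due to the acoustic mismatch between training and testing. One of the main sources of this mismatch is the difference in vocal tract resonances (formant frequencies) between adult and child speakers. The present study aims to reduce the mismatch in formant frequencies by modifying the formants of children’s speech to better correspond to those of adults’ speech. The formant modification is carried out by warping the linear prediction (LP) spectrum computed from children’s speech. The warped LP spectra are then used in a frame-based manner, together with the corresponding LP residuals, to synthesize speech whose formant structure is closer to that of adults’ speech. When an ASR system trained on adults’ speech is tested with children’s speech, the formant modification reduces the spectral mismatch and improves recognition performance. Experiments were conducted using narrowband (8 kHz) and wideband (16 kHz) speech of adult and child speakers from the WSJCAM0 and PF_STAR databases, respectively. In recognizing narrowband children’s speech with adults’ models, the proposed method gave relative improvements of 24% and 11% for the DNN and TDNN models, respectively. For wideband speech, the technique gave relative improvements of 27% and 13%, respectively. The proposed method was also compared with two speaker adaptation methods: vocal tract length normalization (VTLN) and speaking rate adaptation (SRA). The comparison showed that the proposed method performed best. We also combined the proposed method with VTLN and SRA and found a further reduction in word error rate (WER). Moreover, our experiments on noisy speech with various types of additive noise and signal-to-noise ratios showed that the proposed method also performs well under degraded conditions.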
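The abstract only outlines the processing chain: LP analysis of each frame, warping of the LP spectrum, and frame-wise resynthesis from the LP residual. The pure-Python sketch below illustrates one way such a chain could be built, under stated assumptions. The piecewise-linear warping function, the warping factor `alpha`, the frame length, LP order, number of spectral bins, the 50%-overlap Hann overlap-add, and the per-frame energy matching are all hypothetical choices for illustration, not the settings or warping function used in the paper. Turning the warped LP spectrum back into an all-pole filter (by computing its autocorrelation and re-running the Levinson-Durbin recursion) is likewise an assumed technique, not necessarily the authors' one.

```python
import math

# Hypothetical sketch: alpha, frame_len, order, n_bins and the warp shape are
# illustrative assumptions, not the paper's settings.

def autocorr(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion. Returns (a, err) where a = [1, a1, ..., ap]
    defines the LP inverse filter A(z) = 1 + sum_k a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lp_power_spectrum(a, gain, n_bins):
    """Sample the LP spectrum gain / |A(e^jw)|^2 at n_bins + 1 points on [0, pi]."""
    spec = []
    for m in range(n_bins + 1):
        w = math.pi * m / n_bins
        re = sum(ak * math.cos(k * w) for k, ak in enumerate(a))
        im = sum(ak * math.sin(k * w) for k, ak in enumerate(a))
        spec.append(gain / (re * re + im * im))
    return spec

def source_freq(w, alpha):
    """Piecewise-linear warp: output frequency w reads the input spectrum at
    w / alpha below a break point, then joins linearly to (pi, pi).
    alpha < 1 moves a spectral peak at F down to alpha * F."""
    w_b = 0.85 * math.pi * min(1.0, alpha)
    if w <= w_b:
        return w / alpha
    return w_b / alpha + (w - w_b) * (math.pi - w_b / alpha) / (math.pi - w_b)

def warp_spectrum(spec, alpha):
    """Resample a sampled spectrum along the warped axis (linear interpolation)."""
    n_bins = len(spec) - 1
    out = []
    for m in range(n_bins + 1):
        pos = source_freq(math.pi * m / n_bins, alpha) / math.pi * n_bins
        pos = min(max(pos, 0.0), float(n_bins))
        i = min(int(pos), n_bins - 1)
        frac = pos - i
        out.append((1.0 - frac) * spec[i] + frac * spec[i + 1])
    return out

def spectrum_to_autocorr(spec, order):
    """Trapezoidal approximation of r[k] = (1/pi) * integral_0^pi P(w) cos(k w) dw."""
    n_bins = len(spec) - 1
    r = []
    for k in range(order + 1):
        total = 0.5 * (spec[0] + spec[n_bins] * math.cos(k * math.pi))
        total += sum(spec[m] * math.cos(k * math.pi * m / n_bins) for m in range(1, n_bins))
        r.append(total / n_bins)
    return r

def modify_formants(x, alpha, frame_len=400, order=16, n_bins=256):
    """Frame-based sketch: LP analysis, warp the LP spectrum, refit an all-pole
    filter A'(z), and resynthesize from the original LP residual (overlap-add)."""
    hop = frame_len // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    padded = [0.0] * hop + list(x) + [0.0] * (hop + (-len(x)) % hop)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(padded[start:start + frame_len], window)]
        r = autocorr(frame, order)
        if r[0] == 0.0:
            continue  # all-zero frame contributes nothing
        r[0] *= 1.0 + 1e-9  # tiny white-noise correction for numerical safety
        a, err = levinson(r, order)
        # LP residual e[n] = sum_k a_k * frame[n - k] (FIR inverse filter A(z)).
        e = [sum(a[k] * frame[n - k] for k in range(min(n, order) + 1)) for n in range(frame_len)]
        warped = warp_spectrum(lp_power_spectrum(a, err, n_bins), alpha)
        a_new, _ = levinson(spectrum_to_autocorr(warped, order), order)
        # Synthesis y[n] = e[n] - sum_{k>=1} a'_k * y[n - k] (all-pole filter 1/A'(z)).
        y = []
        for n in range(frame_len):
            y.append(e[n] - sum(a_new[k] * y[n - k] for k in range(1, min(n, order) + 1)))
        # Match the input frame energy (an assumption of this sketch).
        e_in, e_out = sum(v * v for v in frame), sum(v * v for v in y)
        gain = math.sqrt(e_in / e_out) if e_out > 0.0 else 0.0
        for n in range(frame_len):
            out[start + n] += gain * y[n]
    return out[hop:hop + len(x)]
```

With `alpha = 1.0` the warp is the identity, so the chain reduces to LP analysis followed by exact resynthesis and the output should match the input up to numerical error. Choosing `alpha < 1` shifts spectral peaks downward, which is the direction needed to move children's (higher) formants toward adult values; how strongly to warp is a free parameter of this sketch.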
Similar resources
Evaluation of formant-like features for ASR
This paper investigates possibilities to automatically find a low-dimensional, formant-related physical representation of the speech signal, which is suitable for automatic speech recognition (ASR). This aim is motivated by the fact that formants have been shown to be discriminant features for ASR. Combinations of automatically extracted formant-like features and ‘conventional’, noise-robust, st...
Parallel combination of speech streams for improved ASR
In a growing number of applications, such as simultaneous interpretation, audio or text may be available conveying the same information in different languages. These different views contain redundant information that can be explored to enhance the performance of speech and language processing applications. We propose a method that directly integrates ASR word graphs or lattices and phrase table...
A Novel Speech/Noise Discrimination Method for Embedded ASR System
The problem of speech/noise discrimination has become increasingly important as the automatic speech recognition (ASR) system is applied in the real world. Robustness and simplicity are two challenges to the speech/noise discrimination method for an embedded system. The energy-based feature is the most suitable and applicable feature for speech/noise discrimination for embedded ASR system becau...
Improved modelling of speech dynamics using non-linear formant trajectories for HMM-based speech synthesis
This paper describes the use of non-linear formant trajectories to model speech dynamics. The performance of the non-linear formant dynamics model is evaluated using HMM-based speech synthesis experiments, in which the 12-dimensional parallel formant synthesiser control parameters and their time derivatives are used as the feature vectors in the HMM. Two types of formant synthesiser control par...
Speech formant frequency estimation: evaluating a nonstationary analysis method
The objective of this paper is to critically evaluate the performance of a nonstationary analysis method in tracking speech formant frequencies as they change with time due to the natural variations in the vocal-tract system during speech production. The method of instantaneous frequency estimation is applied to the tracking of speech formant frequencies to observe the time variations in the vo...
Journal
Journal title: Speech Communication
Year: 2022
ISSN: ['1872-7182', '0167-6393']
DOI: https://doi.org/10.1016/j.specom.2021.11.003